An intrusion detection system collects and analyzes information from different areas within a computer or a network to identify possible security threats, including threats from both outside and inside the organization. It deals with large amounts of data containing many irrelevant and redundant features, which increase processing time and lower the detection rate. Therefore, feature selection should be treated as an indispensable pre-processing step to improve overall system performance when mining huge datasets. In this paper, we focus on a two-step approach to feature selection based on Random Forest. The first step selects the features with higher variable importance scores and guides the initialization of the search process for the second step, which outputs the final feature subset for classification and interpretation. The effectiveness of this algorithm is demonstrated on the KDD'99 intrusion detection dataset, which is based on the DARPA 98 dataset and provides labeled data for researchers working in the field of intrusion detection. An important deficiency of the KDD'99 dataset is its huge number of redundant records, as observed earlier. Therefore, we have derived a dataset, RRE-KDD, by eliminating redundant records from the KDD'99 train and test datasets, so that the classifiers and the feature selection method will not be biased towards more frequent records. RRE-KDD consists of the KDD99Train+ and KDD99Test+ datasets for training and testing purposes, respectively. The experimental results show that the proposed Random Forest based approach can select the most important and relevant features for classification, which in turn not only reduces the number of input features and the time required but also increases classification accuracy.
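
The redundant-record elimination behind RRE-KDD can be illustrated with a minimal sketch. It assumes that "redundant" means an exact duplicate row (all feature values and the label identical) and that each set is de-duplicated independently; the abstract does not spell out either detail. The function name `remove_redundant_records` and the toy rows are hypothetical and are not taken from the KDD'99 files.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

Record = Tuple[Hashable, ...]


def remove_redundant_records(records: Iterable[Sequence[Hashable]]) -> List[Record]:
    """Keep the first occurrence of each distinct record, preserving order.

    Assumption: a record is redundant only if every field, including the
    label, matches a record that was already seen.
    """
    seen = set()
    unique: List[Record] = []
    for rec in records:
        key = tuple(rec)  # tuples are hashable, so they can be stored in a set
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Toy rows in a made-up (duration, protocol, service, label) layout.
toy_train = [
    (0, "tcp", "http", "normal"),
    (0, "tcp", "http", "normal"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (0, "icmp", "ecr_i", "smurf"),
    (2, "tcp", "ftp", "normal"),
]

deduped = remove_redundant_records(toy_train)
print(len(toy_train), "->", len(deduped))
```

In this toy input the repeated attack row stops outnumbering the other records after de-duplication, which is the kind of frequency bias the derived dataset is meant to avoid.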